A Speech Corpus for Modeling Language Acquisition: CAREGIVER

Authors

Toomas Altosaar

Louis ten Bosch

Guillaume Aimetti

Christos Koniaris

Kris Demuynck

Henk van den Heuvel

Abstract

A multi-lingual speech corpus for modeling language acquisition, called CAREGIVER, has been designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The paper describes the motivation behind the corpus and its design, which relies on current knowledge regarding infant language acquisition. Instead of recording infants and children, the voices of their primary and secondary caregivers were captured in both infant-directed and adult-directed speech modes over four languages in a read-speech manner. The challenges and methods applied to obtain similar prompts in terms of complexity and semantics across different languages, as well as the normalized recording procedures employed at different locations, are covered. The corpus contains nearly 66000 utterance-based audio files spoken over a two-year period by 17 male and 17 female native speakers of Dutch, English, Finnish, and Swedish. An orthographic transcription is available for every utterance. Time-aligned word and phone annotations also exist for many of the sub-corpora. The CAREGIVER corpus will be published via ELRA.

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Language model acquisition from a text corpus for speech understanding

Speech understanding can be viewed as a problem of translating input natural language of speech recognition results into output semantic language. This paper describes automatic acquisition of a language model for translating natural language into semantic language from a text corpus using a stochastic method. The method estimates co-occurrence probabilities of input and output grammar rules as...

Effects of Caregiver Prosody on Child Language Acquisition

This paper investigates the role of prosody in one child’s lexical acquisition using an ecologically valid, high-density, longitudinal corpus. The corpus consists of high fidelity recordings collected from microphones embedded throughout the home of a family with a young child. We analyze data collected continuously from ages 9 – 24 months, including the child’s first productive use of language...

Visually Grounded Virtual Accelerometers A Longitudinal Video Investigation of Dyadic Bodily Dynamics around the time of Word Acquisition by

Human movement encodes information about internal states and goals. When these goals involve dyadic interactions, such as in language acquisition, the nature of the movement and proximity become representative, allowing parts of our internal states to manifest. We propose an approach called Visually Grounded Virtual Accelerometers (VGVA), to aid with ecologically-valid video analysis investigat...

First steps in building a large vocabulary continuous speech recognition system for Vietnamese

This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...

Publication date: 2010
